Deep Learning Bootcamp November 2017, GPU Computing for Data Scientists

Using CUDA, Jupyter, PyCUDA and PyTorch

01 PyCUDA verify CUDA 8.0

Web: https://www.meetup.com/Tel-Aviv-Deep-Learning-Bootcamp/events/241762893/

Notebooks: On GitHub

Shlomo Kashani


In [1]:
# Ignore numpy warnings
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
%matplotlib inline
# Some defaults:
plt.rcParams['figure.figsize'] = (12, 6)  # Default plot size

PyCUDA Imports

The Compute Unified Device Architecture (CUDA) is NVIDIA's heterogeneous parallel programming model and software environment, which exposes the GPU's resources through so-called kernels.

Several programming languages, including C/C++, Fortran, and Python, are supported for writing kernels.

Compared with non-scripting languages, Python emphasizes rapid development and offers comprehensive mathematics libraries that have been widely adopted by the scientific community.

PyCUDA uses Python as a wrapper around CUDA C kernels. It provides automatic memory management and error checking, and requires no user-visible compilation step, which makes it well suited to interactive testing and quick prototyping in our applications.
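As a quick illustration of that workflow, the sketch below compiles a tiny CUDA C kernel at runtime through PyCUDA and launches it on a small array, following the pattern from the PyCUDA tutorial (allocate, copy to device, launch, copy back). The kernel name `double_them` is made up for the example, and the sketch falls back to the equivalent NumPy computation when no CUDA device or toolkit is available:

```python
import numpy as np

a = np.arange(8, dtype=np.float32)

try:
    import pycuda.autoinit          # creates a context on the first device
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    # A one-line CUDA C kernel, compiled at runtime by nvcc via PyCUDA.
    mod = SourceModule("""
    __global__ void double_them(float *x)
    {
        int i = threadIdx.x;
        x[i] *= 2.0f;
    }
    """)
    double_them = mod.get_function("double_them")

    a_gpu = cuda.mem_alloc(a.nbytes)           # allocate device memory
    cuda.memcpy_htod(a_gpu, a)                 # host -> device copy
    double_them(a_gpu, block=(int(a.size), 1, 1), grid=(1, 1))
    result = np.empty_like(a)
    cuda.memcpy_dtoh(result, a_gpu)            # device -> host copy
except Exception:
    # No GPU (or no CUDA toolkit) present: use the NumPy reference instead.
    result = a * 2

print(result)
```

This is the shape of every PyCUDA program in this bootcamp: the kernel source lives in a Python string, and PyCUDA handles compilation and transfer boilerplate.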


In [2]:
%reset -f
import pycuda
from pycuda import compiler
import pycuda.driver as drv
import pycuda.driver as cuda

Available CUDA Devices


In [3]:
drv.init()
print("%d device(s) found." % drv.Device.count())

for ordinal in range(drv.Device.count()):
    dev = drv.Device(ordinal)
    print("Device #%d: %s" % (ordinal, dev.name()))
    print("  Compute Capability: %d.%d" % dev.compute_capability())
    print("  Total Memory: %s KB" % (dev.total_memory() // 1024))
    atts = [(str(att), value)
            for att, value in dev.get_attributes().items()]
    atts.sort()

    for att, value in atts:
        print("  %s: %s" % (att, value))


1 device(s) found.
Device #0: Tesla K80
  Compute Capability: 3.7
  Total Memory: 11714432 KB
  ASYNC_ENGINE_COUNT: 2
  CAN_MAP_HOST_MEMORY: 1
  CLOCK_RATE: 823500
  COMPUTE_CAPABILITY_MAJOR: 3
  COMPUTE_CAPABILITY_MINOR: 7
  COMPUTE_MODE: DEFAULT
  CONCURRENT_KERNELS: 1
  ECC_ENABLED: 1
  GLOBAL_L1_CACHE_SUPPORTED: 1
  GLOBAL_MEMORY_BUS_WIDTH: 384
  GPU_OVERLAP: 1
  INTEGRATED: 0
  KERNEL_EXEC_TIMEOUT: 1
  L2_CACHE_SIZE: 1572864
  LOCAL_L1_CACHE_SUPPORTED: 1
  MANAGED_MEMORY: 1
  MAXIMUM_SURFACE1D_LAYERED_LAYERS: 2048
  MAXIMUM_SURFACE1D_LAYERED_WIDTH: 65536
  MAXIMUM_SURFACE1D_WIDTH: 65536
  MAXIMUM_SURFACE2D_HEIGHT: 32768
  MAXIMUM_SURFACE2D_LAYERED_HEIGHT: 32768
  MAXIMUM_SURFACE2D_LAYERED_LAYERS: 2048
  MAXIMUM_SURFACE2D_LAYERED_WIDTH: 65536
  MAXIMUM_SURFACE2D_WIDTH: 65536
  MAXIMUM_SURFACE3D_DEPTH: 2048
  MAXIMUM_SURFACE3D_HEIGHT: 32768
  MAXIMUM_SURFACE3D_WIDTH: 65536
  MAXIMUM_SURFACECUBEMAP_LAYERED_LAYERS: 2046
  MAXIMUM_SURFACECUBEMAP_LAYERED_WIDTH: 32768
  MAXIMUM_SURFACECUBEMAP_WIDTH: 32768
  MAXIMUM_TEXTURE1D_LAYERED_LAYERS: 2048
  MAXIMUM_TEXTURE1D_LAYERED_WIDTH: 16384
  MAXIMUM_TEXTURE1D_LINEAR_WIDTH: 134217728
  MAXIMUM_TEXTURE1D_MIPMAPPED_WIDTH: 16384
  MAXIMUM_TEXTURE1D_WIDTH: 65536
  MAXIMUM_TEXTURE2D_ARRAY_HEIGHT: 16384
  MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES: 2048
  MAXIMUM_TEXTURE2D_ARRAY_WIDTH: 16384
  MAXIMUM_TEXTURE2D_GATHER_HEIGHT: 16384
  MAXIMUM_TEXTURE2D_GATHER_WIDTH: 16384
  MAXIMUM_TEXTURE2D_HEIGHT: 65536
  MAXIMUM_TEXTURE2D_LINEAR_HEIGHT: 65000
  MAXIMUM_TEXTURE2D_LINEAR_PITCH: 1048544
  MAXIMUM_TEXTURE2D_LINEAR_WIDTH: 65000
  MAXIMUM_TEXTURE2D_MIPMAPPED_HEIGHT: 16384
  MAXIMUM_TEXTURE2D_MIPMAPPED_WIDTH: 16384
  MAXIMUM_TEXTURE2D_WIDTH: 65536
  MAXIMUM_TEXTURE3D_DEPTH: 4096
  MAXIMUM_TEXTURE3D_DEPTH_ALTERNATE: 16384
  MAXIMUM_TEXTURE3D_HEIGHT: 4096
  MAXIMUM_TEXTURE3D_HEIGHT_ALTERNATE: 2048
  MAXIMUM_TEXTURE3D_WIDTH: 4096
  MAXIMUM_TEXTURE3D_WIDTH_ALTERNATE: 2048
  MAXIMUM_TEXTURECUBEMAP_LAYERED_LAYERS: 2046
  MAXIMUM_TEXTURECUBEMAP_LAYERED_WIDTH: 16384
  MAXIMUM_TEXTURECUBEMAP_WIDTH: 16384
  MAX_BLOCK_DIM_X: 1024
  MAX_BLOCK_DIM_Y: 1024
  MAX_BLOCK_DIM_Z: 64
  MAX_GRID_DIM_X: 2147483647
  MAX_GRID_DIM_Y: 65535
  MAX_GRID_DIM_Z: 65535
  MAX_PITCH: 2147483647
  MAX_REGISTERS_PER_BLOCK: 65536
  MAX_REGISTERS_PER_MULTIPROCESSOR: 131072
  MAX_SHARED_MEMORY_PER_BLOCK: 49152
  MAX_SHARED_MEMORY_PER_MULTIPROCESSOR: 114688
  MAX_THREADS_PER_BLOCK: 1024
  MAX_THREADS_PER_MULTIPROCESSOR: 2048
  MEMORY_CLOCK_RATE: 2505000
  MULTIPROCESSOR_COUNT: 13
  MULTI_GPU_BOARD: 0
  MULTI_GPU_BOARD_GROUP_ID: 0
  PCI_BUS_ID: 0
  PCI_DEVICE_ID: 4
  PCI_DOMAIN_ID: 0
  STREAM_PRIORITIES_SUPPORTED: 1
  SURFACE_ALIGNMENT: 512
  TCC_DRIVER: 0
  TEXTURE_ALIGNMENT: 512
  TEXTURE_PITCH_ALIGNMENT: 32
  TOTAL_CONSTANT_MEMORY: 65536
  UNIFIED_ADDRESSING: 1
  WARP_SIZE: 32
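Two of the attributes above combine into a useful derived figure: theoretical peak memory bandwidth is the memory clock times the bus width in bytes, doubled for DDR. A minimal sketch using the K80 values printed above (`MEMORY_CLOCK_RATE` is reported in kHz):

```python
# Values taken from the attribute dump above (Tesla K80).
memory_clock_khz = 2505000   # MEMORY_CLOCK_RATE, in kHz
bus_width_bits = 384         # GLOBAL_MEMORY_BUS_WIDTH

# Peak bandwidth = clock (Hz) * bus width (bytes) * 2 (DDR), in GB/s.
peak_gb_s = memory_clock_khz * 1e3 * (bus_width_bits / 8) * 2 / 1e9
print("Theoretical peak memory bandwidth: %.1f GB/s" % peak_gb_s)
```

The result, about 240.5 GB/s, is close to NVIDIA's quoted 240 GB/s per K80 GPU die; measured bandwidth in practice is lower.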

In [4]:
import pycuda.autoinit
import pycuda.driver as cuda

(free,total)=cuda.mem_get_info()
print("Global memory occupancy:%f%% free"%(free*100/total))

for devicenum in range(cuda.Device.count()):
    device=cuda.Device(devicenum)
    attrs=device.get_attributes()

    #Beyond this point is just pretty printing
    print("\n===Attributes for device %d"%devicenum)
    for key, value in attrs.items():
        print("%s:%s" % (str(key), str(value)))


Global memory occupancy:99.000000% free

===Attributes for device 0
MAX_THREADS_PER_BLOCK:1024
MAX_BLOCK_DIM_X:1024
MAX_BLOCK_DIM_Y:1024
MAX_BLOCK_DIM_Z:64
MAX_GRID_DIM_X:2147483647
MAX_GRID_DIM_Y:65535
MAX_GRID_DIM_Z:65535
MAX_SHARED_MEMORY_PER_BLOCK:49152
TOTAL_CONSTANT_MEMORY:65536
WARP_SIZE:32
MAX_PITCH:2147483647
MAX_REGISTERS_PER_BLOCK:65536
CLOCK_RATE:823500
TEXTURE_ALIGNMENT:512
GPU_OVERLAP:1
MULTIPROCESSOR_COUNT:13
KERNEL_EXEC_TIMEOUT:1
INTEGRATED:0
CAN_MAP_HOST_MEMORY:1
COMPUTE_MODE:DEFAULT
MAXIMUM_TEXTURE1D_WIDTH:65536
MAXIMUM_TEXTURE2D_WIDTH:65536
MAXIMUM_TEXTURE2D_HEIGHT:65536
MAXIMUM_TEXTURE3D_WIDTH:4096
MAXIMUM_TEXTURE3D_HEIGHT:4096
MAXIMUM_TEXTURE3D_DEPTH:4096
MAXIMUM_TEXTURE2D_ARRAY_WIDTH:16384
MAXIMUM_TEXTURE2D_ARRAY_HEIGHT:16384
MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES:2048
SURFACE_ALIGNMENT:512
CONCURRENT_KERNELS:1
ECC_ENABLED:1
PCI_BUS_ID:0
PCI_DEVICE_ID:4
TCC_DRIVER:0
MEMORY_CLOCK_RATE:2505000
GLOBAL_MEMORY_BUS_WIDTH:384
L2_CACHE_SIZE:1572864
MAX_THREADS_PER_MULTIPROCESSOR:2048
ASYNC_ENGINE_COUNT:2
UNIFIED_ADDRESSING:1
MAXIMUM_TEXTURE1D_LAYERED_WIDTH:16384
MAXIMUM_TEXTURE1D_LAYERED_LAYERS:2048
MAXIMUM_TEXTURE2D_GATHER_WIDTH:16384
MAXIMUM_TEXTURE2D_GATHER_HEIGHT:16384
MAXIMUM_TEXTURE3D_WIDTH_ALTERNATE:2048
MAXIMUM_TEXTURE3D_HEIGHT_ALTERNATE:2048
MAXIMUM_TEXTURE3D_DEPTH_ALTERNATE:16384
PCI_DOMAIN_ID:0
TEXTURE_PITCH_ALIGNMENT:32
MAXIMUM_TEXTURECUBEMAP_WIDTH:16384
MAXIMUM_TEXTURECUBEMAP_LAYERED_WIDTH:16384
MAXIMUM_TEXTURECUBEMAP_LAYERED_LAYERS:2046
MAXIMUM_SURFACE1D_WIDTH:65536
MAXIMUM_SURFACE2D_WIDTH:65536
MAXIMUM_SURFACE2D_HEIGHT:32768
MAXIMUM_SURFACE3D_WIDTH:65536
MAXIMUM_SURFACE3D_HEIGHT:32768
MAXIMUM_SURFACE3D_DEPTH:2048
MAXIMUM_SURFACE1D_LAYERED_WIDTH:65536
MAXIMUM_SURFACE1D_LAYERED_LAYERS:2048
MAXIMUM_SURFACE2D_LAYERED_WIDTH:65536
MAXIMUM_SURFACE2D_LAYERED_HEIGHT:32768
MAXIMUM_SURFACE2D_LAYERED_LAYERS:2048
MAXIMUM_SURFACECUBEMAP_WIDTH:32768
MAXIMUM_SURFACECUBEMAP_LAYERED_WIDTH:32768
MAXIMUM_SURFACECUBEMAP_LAYERED_LAYERS:2046
MAXIMUM_TEXTURE1D_LINEAR_WIDTH:134217728
MAXIMUM_TEXTURE2D_LINEAR_WIDTH:65000
MAXIMUM_TEXTURE2D_LINEAR_HEIGHT:65000
MAXIMUM_TEXTURE2D_LINEAR_PITCH:1048544
MAXIMUM_TEXTURE2D_MIPMAPPED_WIDTH:16384
MAXIMUM_TEXTURE2D_MIPMAPPED_HEIGHT:16384
COMPUTE_CAPABILITY_MAJOR:3
COMPUTE_CAPABILITY_MINOR:7
MAXIMUM_TEXTURE1D_MIPMAPPED_WIDTH:16384
STREAM_PRIORITIES_SUPPORTED:1
GLOBAL_L1_CACHE_SUPPORTED:1
LOCAL_L1_CACHE_SUPPORTED:1
MAX_SHARED_MEMORY_PER_MULTIPROCESSOR:114688
MAX_REGISTERS_PER_MULTIPROCESSOR:131072
MANAGED_MEMORY:1
MULTI_GPU_BOARD:0
MULTI_GPU_BOARD_GROUP_ID:0
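Attributes such as `MAX_THREADS_PER_BLOCK` feed directly into kernel launch configuration. A minimal sketch of the usual 1-D grid-size calculation (plain Python; the helper name `launch_config` is invented here, and the K80's limit of 1024 is passed as a default rather than queried from the device):

```python
def launch_config(n, block_size=256, max_threads_per_block=1024):
    """Return (grid, block) 1-D launch dimensions covering n elements."""
    assert block_size <= max_threads_per_block
    grid_size = (n + block_size - 1) // block_size  # ceiling division
    return grid_size, block_size

print(launch_config(1_000_000))  # (3907, 256)
```

Each kernel then guards against the overshoot (`grid * block >= n`) with a bounds check like `if (i < n)`.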

In [ ]:
! jupyter-nbconvert "01 PyCUDA verify CUDA 8.0.ipynb" --to slides --reveal-prefix=reveal.js --post serve --ServerPostProcessor.ip="0.0.0.0"


[NbConvertApp] Converting notebook 01 PyCUDA verify CUDA 8.0.ipynb to slides
[NbConvertApp] Writing 269541 bytes to 01 PyCUDA verify CUDA 8.0.slides.html
[NbConvertApp] Redirecting reveal.js requests to https://cdnjs.cloudflare.com/ajax/libs/reveal.js/3.1.0
Serving your slides at http://127.0.0.1:8000/01 PyCUDA verify CUDA 8.0.slides.html
Use Control-C to stop this server

In [ ]: